22 research outputs found

    Summary Management in P2P Systems

    International audience. Sharing huge, massively distributed databases in P2P systems is inherently difficult. As the amount of stored data increases, data localization techniques are no longer sufficient. A practical approach is to rely on compact database summaries rather than raw database records, whose access is costly in large P2P systems. In this paper, we consider summaries that are synthetic, multidimensional views with two main virtues. First, they can be directly queried and used to approximately answer a query without exploring the original data. Second, as semantic indexes, they support locating relevant nodes based on data content. Our main contribution is to define a summary model for P2P systems, and the appropriate algorithms for summary management. Our performance evaluation shows that the cost of query routing is minimized, while the cost of summary maintenance remains low.

    Cluster-based Search Technique for P2P Systems

    We consider network clustering as a way to improve the performance of locating data in unstructured P2P systems. Connectivity-based Distributed node Clustering (CDC) and SCM-based Distributed Clustering (SDC) are two major protocols that partition a network topology into clusters, based on node connectivity. These protocols focus on the accuracy of the clustering scheme, measured with the Scale Coverage Measure (SCM), and on its maintenance under node churn. However, they do not propose search techniques that take advantage of their clustering information. Thus, their proposals have not been evaluated against the motivation behind them. In this work, we propose a new, efficient Cluster-based Search Technique (CBST) for unstructured P2P systems. We use it to validate connectivity-based clustering schemes, according to the trade-off between the cost of maintaining clusters and the benefit for query processing. Our experimental results show the efficiency of CBST implemented over the SDC protocol. By simply exploiting clustering features of the underlying network, a query can travel across a large number of nodes with a minimum number of messages. CBST eliminates a large portion of redundant messages, and thus avoids overloading the P2P network.

    Design of PeerSum: a Summary Service for P2P Applications

    International audience. Sharing huge databases in distributed systems is inherently difficult. As the amount of stored data increases, data localization techniques are no longer sufficient. A more efficient approach is to rely on compact database summaries rather than raw database records, whose access is costly in large distributed systems. In this paper, we propose PeerSum, a new service for managing summaries over shared data in large P2P and Grid applications. Our summaries are synthetic, multidimensional views with two main virtues. First, they can be directly queried and used to approximately answer a query without exploring the original data. Second, as semantic indexes, they support locating relevant nodes based on data content. Our main contribution is to define a summary model for P2P systems, and the algorithms for summary management. Our performance evaluation shows that the cost of query routing is minimized, while the cost of summary maintenance remains low.

    Data Sharing in P2P Systems

    To appear in Springer's "Handbook of P2P Networking". In this chapter, we survey P2P data sharing systems. Throughout, we focus on the evolution from simple file-sharing systems, with limited functionality, to Peer Data Management Systems (PDMS) that support advanced applications with more sophisticated data management techniques. Advanced P2P applications deal with semantically rich data (e.g. XML documents, relational tables), using a high-level SQL-like query language. We start our survey with an overview of the existing P2P network architectures and the associated routing protocols. Then, we discuss data indexing techniques based on their degree of distribution and the semantics they can capture from the underlying data. We also discuss schema management techniques, which allow integrating heterogeneous data. We conclude by discussing the techniques proposed for processing complex queries (e.g. range and join queries). Complex query facilities are necessary for advanced applications that require a high level of search expressiveness. This last part shows the lack of querying techniques that support approximate query answering.

    Gestion de résumés de données dans les systèmes pair–pair

    International audience. In this paper, we propose managing data summaries in unstructured P2P systems. Our summaries are intelligible views with two main virtues. First, they can be directly queried and used to approximately answer a query. Second, as semantic indexes, they support locating relevant nodes based on data content. The performance evaluation of our proposal shows that the cost of query routing is minimized, while the cost of summary maintenance remains low. French abstract (translated): In this work, we propose maintaining data summaries in unstructured P2P systems. Our summaries are intelligible views with a twofold advantage for query processing. They can either answer a query approximately or guide its propagation toward the relevant peers based on data content. The performance evaluation of our proposal showed that query cost is greatly reduced, without incurring high summary maintenance costs.

    Peersum : Gestion des résumés de données dans les systèmes P2P

    Base de Données Avancées (BDA). National audience. Sharing huge, massively distributed databases in P2P systems is inherently difficult. As the amount of stored data increases, data localization techniques are no longer sufficient. A practical approach is to rely on compact database summaries rather than raw database records, whose access is costly in large P2P systems. In this paper, we consider summaries that are synthetic, multidimensional views with two main virtues. First, they can be directly queried and used to approximately answer a query without exploring the original data. Second, as semantic indexes, they support locating relevant nodes based on data content. The main contribution of this paper is to define an efficient algorithm for partitioning an unstructured P2P network into domains, in order to optimally distribute summaries in the network. Then, we propose a distributed algorithm for maintaining a summary in a given domain. Our performance evaluation shows that the cost of query routing is minimized, while the cost of summary maintenance remains low.

    PeerSum: a Summary Service for P2P Applications

    International audience. Sharing huge databases in distributed systems is inherently difficult. As the amount of stored data increases, data localization techniques are no longer sufficient. A practical approach is to rely on compact database summaries rather than raw database records, whose access is costly in large distributed systems. In this paper, we propose PeerSum, a new service for managing summaries over shared data in large P2P and Grid applications. Our summaries are synthetic, multidimensional views with two main virtues. First, they can be directly queried and used to approximately answer a query without exploring the original data. Second, as semantic indexes, they support locating relevant nodes based on data content. Our main contribution is to define a summary model for P2P systems, and the algorithms for summary management. Our performance evaluation shows that the cost of query routing is minimized, while the cost of summary maintenance remains low.

    Techniques de localisation et de résumé des données dans les systèmes P2P

    The goal of this thesis is to contribute to the development of data localization and summarization techniques in P2P environments. At the application layer, we focus on exploiting the semantics that can be captured from the shared data. These semantics can improve search efficiency and allow for more query facilities. To this end, we introduce a novel data indexing technique into P2P systems that relies on linguistic summarization. Our summaries are synthetic, multidimensional views that support locating relevant data based on their content. More interestingly, they provide intelligible data representations which may return approximate answers to user queries. At the P2P network layer, we focus on exploiting the characteristics of the overlay topology, namely its clustering features, in order to reduce the traffic overhead generated by flooding-based mechanisms. This improves the performance of P2P systems, irrespective of whether techniques relying on data semantics are employed at the application layer. To this end, we define a cluster-based search technique which is implemented over a connectivity-based clustering protocol. A connectivity-based clustering protocol aims to discover the natural organization of nodes, based on their connectivity. Thus, it delimits the boundaries of non-overlapping subgraphs (i.e. clusters) which are loosely connected to each other, and within which nodes are highly connected. In this thesis, we first survey P2P data sharing systems. We focus on the evolution from simple file-sharing systems with limited functionality to Peer Data Management Systems (PDMSs) that support advanced applications with more sophisticated data management techniques. Second, we propose a solution for managing linguistic summaries in P2P systems. We define an appropriate summary model and efficient techniques for summary creation and maintenance. We also discuss query processing in the context of summaries.
Third, we propose a cluster-based search technique on top of existing connectivity-based clustering protocols. We focus on reducing redundant query messages which unnecessarily overload the system. We validated our solutions through simulation, and the results show good performance.
French abstract (translated): The goal of this thesis is to contribute to the development of data localization and description techniques in P2P environments. At the application layer, we concentrate on exploiting the semantics that can be captured from the shared data. These semantics can improve search efficiency and enable complex queries. To this end, we present an original data indexing technique for P2P systems based on linguistic summaries. Our summaries are synthetic, multidimensional views that support locating relevant data based on their content. More interestingly, they provide intelligible data representations, which can return approximate answers to user queries. At the P2P network layer, we concentrate on exploiting the characteristics of the topology, namely its clustering characteristics. Information about the clustering of the P2P network can be used to reduce the network traffic produced by the flooding mechanism. This improves the performance of P2P systems, independently of the use of data indexes at the application layer, since the flooding mechanism remains a fundamental building block of unstructured P2P systems. In this thesis, we first present a brief state of the art on P2P data sharing systems, concentrating on the evolution from simple file-sharing systems toward data management systems. Second, we propose a solution for managing data summaries in P2P systems. We define an appropriate model and efficient techniques for creating and updating summaries. We also discuss query processing in the context of summaries. Third, we propose a clustering-based search technique implemented on top of a clustering protocol based on node connectivity. We concentrate on reducing the redundant query messages that needlessly overload the system. We validated our solutions through simulation, and the results show good performance.